Chapter 16: 
Connected Speech in Advanced-Level 
Phonology 


This is the authors’ version. 


published in 


The Handbook of 
Advanced Proficiency 
in Second Language 
Acquisition 


Edited by 
Paul A. Malovrh and 
Alessandro G. Benati 


WILEY Blackwell 


For the original chapter please visit: 


https://onlinelibrary.wiley.com/action/showCitFormats?doi=10.1002%2F9781119261650.ch16


Please cite this chapter as follows: 


Gokgoz-Kurt, B., & Holt, D.E. (2018). Connected Speech in Advanced-Level Phonology. In 
P.A. Malovrh & A.G. Benati (Eds.), The Handbook of Advanced Proficiency in Second 
Language Acquisition (pp. 304-322). Wiley-Blackwell. 


https://doi.org/10.1002/9781119261650.ch16 


16 Connected Speech 
in Advanced-Level 
Phonology 


BURCU GOKGOZ-KURT¹ AND
D. ERIC HOLT²


¹Dumlupinar University, Turkey
²University of South Carolina


Connected speech and the advanced L2 learner 


Understanding the spoken language poses a challenge for most second language 
(L2) learners, especially because it requires L2 learners to determine the boundaries
of word sequences uttered in context. As Steven Pinker (1995, pp. 159-160,
cited in Alameen & Levis, 2015, p. 160) notes:


In speech sound waves, one word runs into the next seamlessly; there are no little 
silences between spoken words the way there are white spaces between written 
words. We simply hallucinate word boundaries when we reach the edge of a stretch 
of sound that matches some entry in our mental dictionary. This becomes apparent 
when we listen to speech in a foreign language: it is impossible to tell where one word 
ends and the next begins. 


In this chapter we flesh out the nature of connected speech and the difficulty it
poses for the learner. In addition to describing connected speech processes and
discussing notions of advancedness, we look at issues of proficiency, learner context,
and perceptual saliency, among others, as well as particular challenges learners face,
the effects of training, and the effects of individual differences. We also touch on
pedagogical perspectives and conclude with directions for future research.

Regarding the fluid nature of speech, a Spanish example will illustrate the
points made by Pinker: Los alumnos están en el aula (‘the students are in the
classroom’) is pronounced as Lo-s|a-lum-no-s|es-ta-n|e-n|e-l|au-la, where no
word’s boundaries line up with both the beginning and the end of syllables as pronounced.
If a learner is expecting the beginning of a word to coincide with the
beginning of a syllable, and the end of the word to coincide with the end of a syllable
(even though a word may be made up of more than one syllable), this type of
linked speech greatly degrades perception, and word identification may be hampered
or thwarted. However, armed with the knowledge that Spanish prefers
open/CV syllables and disprefers vowel-initial syllables when possible, the
learner in perception can ‘undo’ the syllable linking to achieve comprehension.
However effortlessly it may seem to be produced by the interlocutor, deciphering
these word boundaries in an L2 is not an easy task at all. This is mainly because
word boundaries may disappear or their position may shift, usually as a result of
the integration of words into each other during articulation, thus causing a loss of
“phonetic identity” in words (Kohler, 1990). This process in which speakers “draw
[the sounds] together” and make the word boundaries smooth is referred to as
connected speech, e.g., in English, pronouncing ‘want to’ as ‘wanna’ [wɑnə], and
‘going to’ as ‘gonna’ [gʌnə] (Clarey & Dixson, 1963). These shifts in word boundaries
and variability in phonetic forms are triggered by various factors, such as the ways
of articulation, morphological, lexical, and syntactic properties, sentence stress,
and more importantly, the speaking styles as stipulated by the communicative
context (Kohler, 1990).


Given the subtlety in producing these processes and the changes they bring
about, it should not be surprising that a majority of L2 learners find it challenging
to decode the commonly used connected speech forms in an L2, which may cause
them to have breakdowns in communication in real life. While in low-proficiency
L2 learners this difficulty in decoding may be more likely to stem from gaps in
interlanguage syntax, morphology, or lexicon, among others, such gaps are less likely to
be the cause in highly proficient L2 learners, who may be more concerned about
discerning and addressing style differences in diverse communication situations
(see Rubin, 1994). That being said, it should be noted that the proficiency level of
an L2 speaker may serve as an important factor determining the type of connected
speech forms they are ready to learn. (See more on this in the section ‘Training L2
learners to perceive and produce connected speech processes.’)

Regardless of proficiency level, though, depending on the learning context,
there may always be a gap between the language used in the classroom and that
used outside. Therefore, higher proficiency in an L2 may not necessarily mean
more exposure to connected speech forms, especially in foreign language contexts
(Ur, 1987). Learners who are usually exposed to fully articulated ‘teacher talk’ during
their L2 learning experiences in classroom contexts find it frustrating when
they cannot understand authentic conversations among native speakers and
highly proficient non-native speakers of a language. This is especially true for
those who learn L2 vocabulary and grammar in their home countries in a rather
decontextualized way, including those who are highly proficient in written language
but less so in spoken language. Hence, upon arrival at the host country, they
usually have a “rude awakening” (Ur, 1987, p. 10) and claim that native speakers
talk “too fast” (Gilbert, 1995, p. 97). This may originally be attributed to the lack of
perceptual saliency of connected speech forms, which requires L2 listeners to
allocate more attentional resources to be able to recognize these forms in spoken
language, and this ability improves as L2 learners develop higher levels of attention
control as proficiency goes up (Segalowitz & Frenkiel-Fishman, 2005). Given the
difficulty of noticing and deciphering these forms, their use by L2 speakers in
spoken utterances seems to be as challenging as perceiving them, although views
on this might vary depending on the theoretical framework and other individual
differences discussed further below.


Despite such challenges in teaching and learning these forms, a knowledge of 
connected speech forms (in terms of a general understanding of relevant structures 
and processes) is crucial for maintaining effective communication skills in an L2
and fosters improved performance. This is mainly because it provides an L2 learner 
with the necessary skills to understand the subtleties of the authentic use of the 
target language in spoken language, which in turn helps learners to improve their 
ability to transition smoothly while listening to and speaking with various speakers 
in a diversity of contexts. 


Describing connected speech processes 


The term connected speech is used to refer to processes such as reductions, minimizations,
or full eliminations (Brown & Kondo-Brown, 2006) occurring across word boundaries
following certain language-specific phonotactic rules (Joyce, 2013). This type
of speech has also been referred to as reductions, reduced forms of speech, sandhi variation,
or weak forms (Brown & Kondo-Brown, 2006, p. 5; Ito, 2006, p. 17). Connected
speech processes (CSPs) may involve changes, additions, or eliminations to sounds
and sound sequences, which occur due to various linguistic as well as communicative
and pragmatic factors. For example, in English, stress and intonation patterns
play a significant role in determining which sounds or sound sequences are
to be eliminated or modified. While function words, which usually do not bear
stress, undergo deletion, content words and their stress-bearing syllables are not
usually eliminated in connected speech processes. The word ‘and’ in the phrase
‘now and then’ is pronounced as [ən] because ‘and’ is a function word and thus it
is pronounced in its weak form in connected speech. Similarly, in German, function
words including certain types of pronouns, conjunctions, prepositions, and
auxiliary verbs are reduced when unstressed. Kohler (1990) adds that as a result of
this reduction, it is possible to see different phonetic realizations of the same word
depending on the function of reduction. These changes in citation forms of words
occur as a result of certain “temporal and articulatory” features of spontaneous
speech, among other reasons (Hieke, 1987).


Speakers choose to speak using features of connected speech presumably to
save time and energy. In speaking, there is the concept of efficiency, which essentially
tolerates the maximum elision of language patterns in an effort to minimize
the number of phonological units (Rost, 2002). This is also known as “the principle
of least effort” (Zipf, 1949), or “law of economy” (Clarey & Dixson, 1963), both of
which explain why speakers are attracted to speaking with elisions, contractions,
and assimilations in their conversations. Describing this as a balance between
what the speaker and the listener do during communication, Kohler states that:


Word production is a compromise between articulatory economy for the speaker and
acoustic distinctivity for the listener. Economy of effort in speech production is governed
by a number of anatomical, physiological and temporal constraints in the
speech producing apparatus that introduce directionality into reductions, such that
they are not chaotic. Not just any changes, but only certain types are possible, which
occur over and over again in the languages of the world and in historical sound
change. (1990, p. 9)


So, according to Kohler, connected speech processes are mostly a result of articulatory 
factors which may make rule-formations describing such changes possible. 

As the linguistic constraints of the languages differ, “the manifestations and 
distributions” of these changes also vary across as well as within languages. 
Among within-language factors are “the common core of linguistic context and 
context of situation in the widest sense between speaker and hearer, ranging from 
world knowledge through culture and society to the individual discourse setting” 
(Kohler, 1990, p. 10). For example, in English, the primary function of the use of 
connected speech is, in fact, to maintain the rhythm of English by “compressing” 
unstressed sounds and syllables and making articulation easier (Shockey, 2008). 
Similar examples of “compressing” can be found in many other languages. For 
example, in Turkish, in which reductions are observed, the phrase ‘gideceğiz’
meaning ‘we’ll go/leave’ may optionally be pronounced as ‘gidicez’; while the
former represents the citation form of the word, the latter is used as a variation of
the future suffix, due mostly to a desire to save time and energy. That all of these
alterations occur partly because of linguistic requirements but also mostly for
communicative and pragmatic reasons poses a significant challenge for L2 listeners:
the difficulty of keeping up with the message while listening to this
“reduced” speech. In other words, the more reduced a message is, the more effort
L2 listeners have to invest in order to perceive and process the spoken text.

One way to help L2 learners to improve their perception and production of
connected speech forms is by constructing a comprehensive classification of these
forms for languages, and by familiarizing learners with it. Alameen and Levis
(2015) recently classified CSPs in English into six main categories. These are linking,
deletion, insertion, reduction, multiple processes, and modification. Their definition of
linking is limited to “situations in which the ending sound of one word joins the
initial sound of the next, but only when there is no change in the character of the
segments,” e.g. pronouncing the phrase ‘some of’ [sʌm əv] as if it were one word
(p. 162). Deletion includes elisions, e.g. pronouncing ‘call him’ as [kɔl ɪm] by eliding
the initial [h], and contractions, e.g. pronouncing ‘he will’ as ‘he’ll’.
For insertion, they give the example of the cartoon character Popeye’s statement of
‘I am what I am’ realized as ‘I yam what I yam’, in which vowels are connected by
glides at word boundaries (p. 163). Reduction involves vowel reductions in
unstressed syllables and some consonant reductions, e.g. the lack of release of /d/
in the phrase ‘bad boy’ (p. 163). Under the category multiple, they mention commonly
used lexical chunks that show several changes simultaneously, e.g. phrases
such as ‘want to’ pronounced as ‘wanna’ or ‘going to’ pronounced as ‘gonna.’
Finally, the category of modification involves four subcategories: assimilation (e.g.
the assimilation of [n] to [m] before a bilabial stop in a phrase like ‘sun beam’);
flapping (e.g. pronouncing the alveolar stop [t] as an alveolar flap in North American
English in the phrase ‘sit around’); glottalization (e.g. pronouncing the phrase ‘can’t
make it’ as [kænʔmekɪt] as a result of the [t] before the nasal [m] being pronounced
with a specific glottal articulation); and finally, palatalization, e.g. pronouncing
‘that you’ as [ðætʃu]. From a pedagogical perspective, having such a classification
of connected speech is very valuable for linguists, language educators, and L2
textbook writers, as it may help guide them in presenting and teaching these forms
in a systematic way.


Alameen and Levis’s classification describes connected speech processes in 
English, and many other languages also have similar processes in their phonology. 
In French, for example, a word-final voiceless consonant will be voiced when it is 
followed by a voiced segment as in the word ‘avec’ /avek/ being pronounced as 
[aveg] when it is followed by the word ‘vous’ /vu/, that is, /avek vu/ becoming 
[aveg vu] in its phonetic realization. This is an example of regressive voicing 
assimilation and is not normally found in English, and this leads French speakers 
of English to (mis)pronounce ‘nice voice’ /najs vojs/ as [najz vojs] by transferring
(negatively) the assimilation process found in French. 


It should also be pointed out that the term connected speech is not usually used to
describe processes occurring within words (Alameen & Levis, 2015). For example,
the coalescent assimilation in the transformation of the word ‘face’ [feɪs] ~ ‘facial’
[feɪʃəl] is similar to the modification in pronouncing ‘that you’ as [ðætʃu]; however,
while the former type of palatalization occurs within words, the latter occurs
across word boundaries. In fact, since the processes described here are articulatorily
easily explicable or phonetically grounded, and so are ‘natural,’ it is common
for a language to evidence these in all applicable phonetic/phonological contexts.
To draw from Spanish, phonological processes occur without regard to word
boundaries. For example, there is place assimilation between a nasal consonant
and a following obstruent, both across words (e.g. un gato ‘a cat’ pronounced as
[uŋ-ga-to]) and within a word (e.g. tango ‘tango (dance)’ as [taŋ-go]). Likewise,
spirantization of the voiced occlusives /b, d, g/ (whereby the plosives become
fricatives) occurs not only within words (e.g. haba ‘bean’, hada ‘fairy’, haga ‘you do
(subj.)’ with [β, ð, ɣ]), but also across words (e.g. mi balón ‘my ball’, mi dedo ‘my
finger’, mi gol ‘my goal’). Similarly, for syllable structure, Spanish prefers open/
CV syllables, such that an intervocalic consonant (VCV) is produced as the onset
of the final syllable (V-CV), rather than the coda of the initial syllable (*VC-V);
when a consonant-final word is uttered in connected speech before a vowel-initial
word, the same principles of onset satisfaction and coda avoidance apply, as
shown earlier with the example of Los alumnos están en el aula. With regard to
vowel-vowel contact, Spanish likewise ignores word boundaries for determining
maintenance of hiatus or formation of a diphthong, with the high vowels /i, u/
only being realized as [i, u] when stressed (or the only vowel of the syllable), being
realized as [j, w] otherwise: siete ‘seven’ as [sje-te], bueno ‘good’ as [bwe-no] within
words, but also si es ‘if you are’ as [sjes] and su ala ‘your wing’ as [swa-la] across
words. Thus, while the focus of this chapter is “connected speech phenomena,”
this is merely to emphasize the between-word nature of various natural processes
that can impede speech processing and word recognition, and degrade listening
comprehension, as well as being features that contribute to accentedness and
native-likeness.
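
As a rough illustration of the onset-satisfaction principle described above for Spanish,
the following Python sketch moves a word-final consonant into the onset of a following
vowel-initial word. It is only a toy model under stated assumptions: the function name
link_words and the vowel inventory are illustrative choices, not taken from the literature
cited here; the code works on simplified spelling rather than phonetic transcription; and
it ignores silent h, glide formation, and syllabification inside words.

```python
# Hypothetical helper for illustration only; operates on simplified spelling.
VOWELS = set("aeiouáéíóúü")  # assumed vowel letters; silent 'h' is not handled


def link_words(words):
    """Move a word-final consonant to the onset of a following vowel-initial word."""
    linked = []
    carry = ""  # consonant handed over from the previous word
    for i, word in enumerate(words):
        current = carry + word.lower()
        carry = ""
        has_next = i + 1 < len(words)
        if (
            has_next
            and len(current) > 1
            and current[-1] not in VOWELS
            and words[i + 1][0].lower() in VOWELS
        ):
            # Coda avoidance: the final consonant becomes the next word's onset.
            carry = current[-1]
            current = current[:-1]
        linked.append(current)
    return linked


print("|".join(link_words(["Los", "alumnos", "están", "en", "el", "aula"])))
# prints: lo|salumno|sestá|ne|ne|laula
```

The output groups the same segments as the linked pronunciation Lo-s|a-lum-no-s|es-ta-n|e-n|e-l|au-la,
which shows why a listener who expects word edges to coincide with syllable edges must
‘undo’ the linking to recover the words.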


Notions about the term advanced 


The term advanced may be interpreted and has been employed in various ways. It
is used to refer to instructional levels that correspond to (usually collegiate) year of
study, such that novice/beginning = first year, intermediate = second year, and
advanced = third year. Similar terms correspond to proficiency level, with the scale
developed by the American Council on the Teaching of Foreign Languages
(ACTFL) being well known and highly regarded; it is composed of four main
levels, Novice, Intermediate, Advanced, and Superior, with the first three levels further
divided into sublevels, Low, Mid, and High (see www.actfl.org). The ACTFL scale
is used to determine a speaker’s level based on the administration of the Oral
Proficiency Interview (OPI).
Since it assesses proficiency, rather than achievement, the scale measures
what a user can do with the language, rather than what the user knows about
the language. High function in the target language can be important to determine,
but it doesn’t necessarily correspond closely with mastery of phonology.
Pronunciation/phonology per se is not really addressed or assessed by the OPI
or ACTFL proficiency scale; that is, a speaker may be able to carry out high-level
functions with appropriate vocabulary and grammar, yet speak with a heavy
foreign/non-native accent. Similarly, a learner may have passive knowledge of
L2 phonology/pronunciation, yet not be able to implement it, and still sound
far from a native speaker. So, it may very well be the case that a student who is
enrolled in or even has completed third-year language courses does not evidence
‘advanced’ proficiency. (Relatedly, the student may or may not have even
achieved ‘advanced’ knowledge/competence for some particular target structure/
feature.)


Fluency is often referred to in definitions of levels of proficiency, but it is also the 
case that a learner may be quite able to mobilize language for communication and 
deliver a message effectively yet still not master either segmental (individual 
sounds) or suprasegmental (syllable structure, pitch/intonation, between-word 
linking, etc.) features of the L2; likewise, a learner may be able to speak without 
hesitations using ‘advanced’ lexis and structures, even at a fast rate of speech, yet 
not be able to decode/understand a response delivered fluently. 

Finally, we might also evaluate features of connected speech themselves along
some scale of difficulty, such that a given phenomenon may be considered ‘basic’
(easiest to notice and/or acquire), ‘intermediate’ or ‘advanced’ (hardest to notice
and/or acquire), with ease of noticing (perceptual salience) also not necessarily
corresponding with ease of production.

One would think it would be reasonable to assume that increased noticing
(passive knowledge) would lead to increased oral proficiency, but there are intervening
factors of physical performance and phonetic implementation, e.g. a lifetime
of fine-grained L1 motor habits that need to be adjusted and overcome to
approach native-like norms. Likewise, it seems reasonable to hypothesize that
increased noticing would lead to improved speech processing and consequently
listening comprehension. However, these are undertested hypotheses, especially
outside of an ESL/EFL (English as a second/foreign language) context, and much
research along these lines remains to be conceptualized and carried out, and we
encourage the interested and curious reader to pursue these open issues.

Similarly, research remains to be done to determine what sorts of phonological
units/structures/features/processes (e.g. number and/or complexity of individual
sounds/phonemes/allophones, phonotactics/syllable types and adjustments to
them, as well as prosodic features of stress, tone, intonation, and focus) in a
learning context are ‘basic,’ ‘intermediate,’ or ‘advanced.’ Surely this will vary
from one learning context to another (due to the particular L1 and L2), but likely
there are universals to be uncovered. And not unrelatedly, it remains to be elucidated
which features can be most easily taught and acquired in the first-, second-,
and third-year classroom, and beyond, as well as the ‘yield’ of various CSPs
toward improved ability to decipher/decode in listening comprehension, and
exactly which features most raise native-speaker perceptions/ratings of
unaccentedness/native-likeness.


These are many and difficult questions to answer. The remainder of this chapter 
discusses some of these issues, and at the end of the chapter, others are given as 
areas for further study. 


Training L2 learners to perceive and produce connected speech processes 
Motivation and challenges 


Ability to perceive and produce connected speech forms has been shown to lead to
better L2 communication (Matsuzawa, 2006; Underwood & Wallace, 2012). One of
the earliest studies by Henrichsen (1984) compared native speakers and low-level
and high-level learners in their comprehension of spoken English sentences in the
absence and presence of connected speech. The findings indicated an interaction
between proficiency level and the comprehension scores for the spoken input in
the presence/absence of reduced forms. In other words, while advanced learners
performed similarly to native English speakers in the absence of reduced forms,
their comprehension scores were much closer to low-proficiency L2 learners in the
presence of such forms. The results show the importance of familiarizing L2
learners of all proficiency levels, including advanced learners, with CSPs for better
L2 comprehension.


A comprehensive study by Joyce (2011) looked at the relationship between
linguistic knowledge, psycholinguistic subskills, and L2 listening proficiency to
investigate the factors that may help determine L2 aural comprehension ability
and processing. His findings suggested that knowledge of connected speech
processes, or phonological modification knowledge, as he calls it, was one of the two most
important subskills (the other being syntactic knowledge) playing a role in L2
listening performance (p. 86). As an implication of his study, he encourages test
designers to incorporate the ability to accurately process reduced forms as part of
their goals in designing their testing tools, adding that this information could be
used to adjust the difficulty level of a listening-test item as an indicator of proficiency
(pp. 87-88). In fact, this is also in line with Kostin’s (2004) study, which
investigated the factors affecting the difficulty of the Test of English as a Foreign
Language (TOEFL) dialogues and found sandhi variation to be one of the phonological
variables that most cause L2 listeners to experience difficulties in comprehension.
Thus, it is clear that being acquainted with connected speech features is
highly important in understanding native or highly proficient speakers of an L2 in
a variety of contexts, including high-stakes testing conditions. Consequently, paying
equal attention to all aspects of an L2 education is central to helping learners
advance their foreign/second language skills.


These studies show that teaching and learning connected speech forms is crucial.
Immersion in the L2 learning environment is one way to enable learners to
familiarize themselves with features of connected speech by intensively experiencing
the language in authentic oral contexts. However, learning by such exposure
may require relatively extensive periods of time, which may not be feasible for
most L2 learners. L2 instruction, on the other hand, may benefit L2 learners ideally
by providing them with opportunities to help them notice the target structures
(see Schmidt, 1990, 1995). One challenge for learners in noticing the various
connected speech forms is the issue of ‘perceptual saliency.’ This refers to the features
of speech that make “certain features of the input more comprehensible, and
thus more liable to become intake” (Henrichsen, 1984, p. 106). Since perceptual
saliency plays a crucial role in determining the ease of learning certain L2 features,
Henrichsen (1984) sees various connected speech processes as a disadvantage
since such forms are not salient by nature. Therefore, teaching these forms may
help learners to notice and eventually learn them. However, there have been various
challenges in teaching connected speech in L2 classrooms.


One is related to the ‘informal-only’ or ‘substandard’ view of connected speech. 
This view prevails among many teachers and listeners, as such processes have 
been claimed to occur in fast, colloquial, casual, informal, and relaxed speech (Brown & 
Kondo-Brown, 2006, p. 5; Trager, 1982; Weinstein, 1982). Nevertheless, such suppositions 
need to be reconsidered because connected speech may occur in all 
registers, including academic and formal settings (Brown, 1977, pp. 2-3; Brown & 
Kondo-Brown, 2006, p. 5; Ito, 2006; Rogerson, 2006). In fact, Underwood (1989)
explains how difficult it is to draw the boundaries of formality/informality as 
follows: 


... for the language learner the division is not as neat ... It frequently happens, for 
example, that a lecturer delivering a very formal lecture from a prepared set of notes 
switches to informal language when making an aside or recounting an anecdote as an 
illustration of a point just made. Or a person involved in describing a complicated 
phenomenon to a friend over coffee may switch in and out of formal and informal 
styles depending on whether he/she is describing the phenomenon or commenting
on it. Between the extremes, there is a range of formality/informality depending on
the social setting, the relative ages and status of the speaker and listener, their attitudes 
to each other and the topic, the extent to which they share the same background 
knowledge, and so on. (p. 14) 


In this vein, it does not seem reasonable to be too restrictive in making claims
regarding the contexts in which connected speech is used, which means that
depriving L2 learners of the chance to explore the features of connected speech might not
be the best choice to make in a language classroom. In fact, connected speech could
potentially help L2 learners feel sociolinguistically more advantaged, and even
when considered as a marker of informality, it might help learners determine any
‘switching’ between informal and formal use of language in spoken discourse
(Underwood, 1989). Unfortunately, this aforementioned stereotypical view of
connected speech as ‘substandard’ (Brown & Kondo-Brown, 2006, p. 5) or
‘informal-only’ prevails among many L2 teachers and listeners, which is, in fact,
one of the reasons why teachers tend not to teach it and learners tend not to consider
it a priority in their English learning experience (Brown, 1977, p. 3; Gilbert,
1995, p. 105). Instead, both teachers and students, of all L2s, need to be made aware
of and appreciate that connected speech is indeed ‘everyday speech’ and should
be considered the default for most circumstances.


In addition to its perceived ‘substandard’ status, another reason for the reluctance
to practice connected speech phenomena in an L2 classroom may be teachers’
lack of knowledge of and attention to these forms and of appropriate methods and
techniques to teach them (Ito, 2006). A majority of L2 teachers are not usually
familiar with these structures, especially in foreign language contexts, and even if
they are, the instruction may not be systematic enough for learners to make generalizations
(Rogerson, 2006, p. 91). These challenges are exacerbated by the lack of
materials and of sufficient time to devote to teaching connected speech in classrooms
simply because they are not usually part of the curriculum (Brown &
Hilferty, 1986/2006; Henrichsen, 1984; Joyce, 2013; Rogerson, 2006; Underwood &
Wallace, 2012). All these result in an absence of focus on these forms in foreign and
second language classrooms, despite their significant role in L2 communication.
However, it should be noted that these challenges will vary from language to
language, and from instructor to instructor. With regard to the latter, teachers (or
their curricular coordinators) may feel they have to make choices about what
material and structures to cover in a given course, and often neglect matters of
pronunciation, which are often minimally present and ‘marginalized’ anyway (see
Derwing & Munro, 2005, p. 382). Additionally, as shown by previous research, L2
instructors do not always have in-depth linguistics-oriented training or any such
training at all (Breitkreutz, Derwing, & Rossiter, 2001; Derwing & Rossiter, 2002),
instead being practiced in pedagogy, and are more aware of and comfortable with
issues of vocabulary and grammar. Very frequently, the first comprehensive and
systematic study of phonology, including CSPs, is a dedicated course in foreign
language pronunciation that is typically taught after the third-year grammar
sequence, or in beginning graduate-level coursework in linguistics. Students often
comment that they wish they had been exposed to this knowledge earlier in their
learning, and instructors often comment that even so-called advanced learners
have fossilized in pronunciation, given the late exposure (if at all) to in-depth
knowledge and practice, and the cumulative time of language learning while they
were often left to develop unintelligible speech patterns, and were not aware of
negative transfer from their L1. However, this will vary somewhat according to the
particular L2. It is very true of Spanish, in part because many learners of L2 Spanish
suffer from the misconception that Spanish is pronounced as it’s written, and
written as it’s pronounced, both of which notions are false. In French, on the other
hand, at least some CSPs (and other phonological phenomena) are taught from
early stages because of the depth of French orthography, that is, because it is patently
obvious from the start that French spelling does not correspond very closely/shallowly
with the spoken form, and that if a learner pronounces French as it is
written, it will be largely incomprehensible, and if learners expect to recognize
words in spoken French, they must realize that the sound-spelling mapping is a
very complex one.


Given the previous research (see Henrichsen, 1984; Joyce, 2011; Kostin,
2004) revealing the essential role that connected speech plays in L2 comprehension
and communication, it can be maintained that CSPs are worthy of study and
practice, being central to successful communication.


Effects of training and individual differences 


The teachability and the effects of instruction on connected speech comprehension 
and perception have been systematically investigated in several studies
(Brown & Hilferty, 1986/2006; Crawford, 2005; Matsuzawa, 2006), and the findings
showed an improvement in learners’ listening skills in connected speech.

One of the pioneering studies looking at connected speech, by Brown & Hilferty 
(1986/2006), investigated the effects of four weeks of daily 10-minute instruction 
on connected speech in L2 English. Their findings suggested an improvement in 
Chinese university students’ connected speech abilities on a dictation and an 
integrated grammar test, but there was no improvement on the general listening 
comprehension test. A more recent study by Matsuzawa (2006) showed that 
instruction on reduced forms can in fact lead to improvement in L2 listening 
ability based on the results of a cloze-test. Similar studies are needed for other 
languages. 


Limited previous research on the production aspect suggests that the
production of connected speech features improves over time when such features
are practiced in a traditional classroom context (Underwood & Wallace, 2012) as
well as when using computer-assisted language learning (Yang, Lin, & Chung,
2009). Underwood and Wallace (2012), who looked at both production and comprehension,
also found a significant improvement in Japanese learners’ connected
speech comprehension and their self-confidence in conversational ability following
10 weekly instructional periods. Although the findings showed significant
improvement in both production and comprehension, there was no correlation
between learners’ ability to comprehend reduced forms in a listening test and their
production in a spontaneous peer conversation. Similarly, Alameen (2014) looked
at the effects of different instructional methods on the ability to perceive and produce
linking as a connected speech phenomenon in L2 English. Her results showed
no significant improvement in perceptual ability based on the results of the
dictation test; however, the ability to produce linking improved
significantly. Another study, by Kennedy, Blanchet, and Trofimovich (2013), investigated
gains by L2 learners of French in segmental production, stress, intonation, liaison,
and enchaînement, in addition to learner awareness as measured by learning diaries.
The findings indicated no post-instructional gains except for segmental production. 
However, it should be noted that there seems to be disagreement among
researchers and language practitioners as to whether or not producing CSPs
should be the focus in classrooms. Norris (1993, 1995) suggests that the goal
for learners should be the recognition of connected speech features in order to
communicate well, rather than the imitation of native speakers’ use of connected speech
features in learners’ own speech. Brown (1977) also explicitly disapproves of
teaching students to “produce” these “assimilated” or “elided” forms because
“sophisticated students who have been taught to be aware of these forms will introduce
them into their own speech in a natural context when they feel able to control
them” (pp. 156-157). However, she regards “the failure” to understand these forms
as “disastrous for any student who wants to be able to cope with a native English
situation” (p. 157).3 This being said, advanced L2 learners should be able to naturally
make such forms a part of their utterances if they are taught how to perceive,
and ideally produce, these forms in a classroom context. The literature mostly
agrees on prioritizing the teaching of the perception and comprehension of
connected speech features over their production, mainly because the
primary goal of pronunciation teaching is considered to be accuracy in perception
and comprehension, followed by production.


Previous literature has shown that L2 learners mostly benefit from instruction
aimed at improving their perception and production of
connected speech forms. However, learners vary a great deal. There is always variation
in their gains in an L2 classroom in terms of their pace of learning and ultimate 
achievement. Learner backgrounds and learner conditions, as well as other 
individual differences in cognitive abilities, have been shown to influence the 
learning process. According to Ellis (2004), some of the most commonly researched 
factors include language aptitude, learning styles, motivation, anxiety, personality, 
proficiency, learner beliefs, learning strategies, and intelligence and memory,
among others. There are many other factors, such as L1 transfer, linguistic 
background, and age of exposure or length of residence, that also affect the learning 
process. However, studies looking at CSPs in relation to individual variables are very
scarce. The few that exist have looked at study abroad, learner awareness, phonological and
cognitive skills, and listening conditions. In order to identify the contribution of
certain phonological skills to connected speech perception, Wong and his colleagues
(2017a) gave intermediate-to-advanced L1 Chinese learners of L2 English
a battery of tasks to test their spoken word discrimination, part-word recognition,
phoneme awareness, receptive vocabulary, and phonological memory. The findings 
revealed that while receptive vocabulary and part-word recognition predicted
connected speech perception, phonemic awareness and phonological memory
only did so indirectly through the mediator of part-word recognition. The spoken
word recognition task, on the other hand, was not found to be a factor in explaining
the individual differences in the comprehension of the reduced forms in this
study. In a related study, Wong et al. (2017b) this time looked at the effects of
varying listening conditions (multi-talker babble noise, speech-shaped noise, 
factory noise, whispering, and sad emotional tones) on connected speech perception 
by Chinese L2 learners of English. Learners were found to have more difficulty 
recognizing connected speech forms under noisy (as opposed to noise-free) 
conditions, with multi-talker babble noise creating the greatest challenge. 


A study by Kim (1995) investigated the role of attention in understanding
speech at different speech rates and found that listeners paid less attention to
speech when it was read at a faster rate. Therefore, according to him, listeners 
should be “encouraged to move from a more lexical mode [...] to a more syntactic 
mode” (p. 78) because this way, they would be able to comprehend connected 
speech processes occurring across word boundaries, which would otherwise go 
unnoticed (Ito, 2006, p. 23). These findings indicate that training learners to ‘notice’ 
particular forms might prove helpful not only for L2 learning in general, but also 
specifically for learning connected speech forms (Kim, 1995). Another study, by 
Gokgoz-Kurt (2016), showed how attention control relates to improvement in 
connected speech perception, specifically, word-boundary palatalization in L2 
English, following online training sessions. Results indicated that L2 learners of
English benefited from online training and that there was a significant relationship
between learners’ attention control and phonological learning, which shows the
crucial role attention control plays in learning connected speech.


In his small-scale study, Simoes (1996) looked at the effects of study abroad on
the pronunciation of five L2 Spanish learners. Only two participants were found to
have improved on several aspects of their pronunciation, with their use of linking
between words being one of them. Kennedy and Blanchet’s (2014) study investigated
how L2 French learners’ improvement in connected speech perception was
related to their language awareness. Following various activities geared to
practicing connected speech perception (e.g. linking) and raising their awareness,
learners who focused more on “how to use that knowledge to extract meaning
from speech” rather than rehearsing knowledge improved more on perception.
Finally, Ernestus, Kouwenhoven, and Van Mulken’s (2017) study examined the
role of phonotactic constraints and native language in L2 listeners’ interpretation
of reduced vowels in casual speech; their findings showed a direct effect of L1
phonotactic rules, but one that decreases as proficiency level increases.
The studies summarized so far contribute to our understanding of CSPs in 
various ways, but more studies are needed to explore individual differences as 
correlates of connected speech processes. 


Suggestions for future research 

Much work needs to be done to better understand the perception and production
of connected speech at advanced levels of proficiency, as well as the efficacy of
pedagogy, and there is much more we could learn from innovative research
methods. While most research in connected speech has been carried out using similar
methods and tools, various technological tools merit continued
exploration and development, and various research topics merit much further attention. In
what follows, some of these are fleshed out further, while others remain for other
investigators to elaborate.


Instruction and training 


The effects of instruction and training on CSPs have been investigated in various
learning contexts; however, not many empirical studies have looked at the effects
of proficiency level in learning these forms, and those which did so did not find any
significant differences in learning outcomes, possibly due to the low
number of participants representing each proficiency group in the respective
studies (e.g. Gokgoz-Kurt, 2016). Further studies might consider including a balanced
number of people from a variety of proficiency levels as this may reveal to
what extent each group of learners benefits from instruction. Empirical studies
may test whether and how the classification of CSPs (see Alameen & Levis, 2015) 
may be useful for different proficiency groups. Some appropriate questions to ask 
are: What may be the best time to introduce CSPs in the L2 learning process? How 
can a further classification of different CSPs be made based on the readiness of 
different proficiency levels? What types of instructional methods are most 
beneficial? 


Moreover, research investigating the perception and production of different
aspects of pronunciation is not scarce, but there are very few studies looking at the
interaction between perception and production of connected speech (e.g. Alameen,
2014), which calls for additional study. Considering proficiency level, is there a
better way to make CSPs more accessible and comprehensible for different proficiency 
levels? Is there a preferred or more effective order when it comes to teaching 
connected speech perception and production to low- versus high-proficiency 
learners? More studies to investigate related questions would contribute much 
to general understanding of CSPs in relation to proficiency levels and eventually
help improve pedagogical methods and materials. 


Another interesting avenue for research is the use of technology in learning
and teaching CSPs. Several studies have used some form of technology in their
investigation of various CSPs (e.g. Alameen, 2014; Gokgoz-Kurt, 2016; Yang
et al., 2009); however, technology has more to offer in teaching connected speech
perception and production. In search of effective and practical ways to learn, 
teach, and assess CSPs, more studies are needed using acoustic analyses for 
visual feedback, speech recognition, and other exciting possibilities technology 
offers, besides those which compare computer-assisted versus traditional 
methods of teaching CSPs. Although all proficiency levels will benefit from the 
use of technology in learning connected speech, the reality is that advanced 
learners will most likely be more motivated to put time and effort into using 
technology to learn these forms compared to low-proficiency-level learners. This 
is because low-proficiency learners may prefer to use technology to understand 
‘be going to’ as a pure grammatical point before, if not simultaneously, practicing 
‘gonna’ [gənə]. Additionally, technology may be a good means to help learners
to work autonomously in their connected speech learning process. This is especially
true for advanced L2 learners, as shown by previous research in other
aspects of pronunciation (Mantini, 1980). In addition to classroom learning
studies, more studies investigating naturalistic L2 connected speech learning are 
also very much needed. That way, it might be possible to investigate the effects 
of instruction versus mere exposure to target forms, which may lead researchers 
to make stronger claims regarding the sources of improvement. 


Individual learner differences 


In fact, examining individual learner differences is what will give us insight into 
the interaction of proficiency level and the teaching and learning of CSPs. So far, 
studies exploring cognitive and affective factors in relation to connected speech 
learning have looked at attention (Gokgoz-Kurt, 2016; Kim, 1995), learner awareness 
(Kennedy & Blanchet, 2014), L1 phonotactics and proficiency (Ernestus et al., 
2017), listening conditions (Wong et al., 2017b), study abroad (Simoes, 1996), and 
other factors such as spoken word discrimination, part-word recognition, phoneme 
awareness, receptive vocabulary, and phonological memory (Wong et al., 
2017a). However, not all these studies found a considerable relationship or interaction 
between them, and more importantly not all looked specifically at advanced 
learners or proficiency levels. We should continue to survey these factors in a
deeper and more extensive way, varying the learning contexts, the time and nature of
instruction and/or contact, and learner profiles. What factors better predict 
connected speech learning in advanced L2 learners? To what extent does aptitude 
account for successful learning/failure in connected speech learning in advanced 
L2 learners given the same conditions? There are many other factors, such as 
learning strategies, cognitive styles, L1 transfer, age of exposure, or length of 
residence, which affect the learning process and constitute avenues for further 
research. Advanced learners’ and instructors’ perceptions of (teaching)
connected speech, or their motivations or learning and teaching strategies, may also
be investigated.


Research methods and tools 

In order to uncover the nature and interaction of connected speech with other 
factors, some of which were enumerated above, we need accurate and innovative 
methods and tools to measure connected speech. Dictation tests and cloze-tests 
have served as the most commonly used techniques, but they are not without 
shortcomings. Dictation tests are hard to design because the test designer needs to 
make sure that the lexical and syntactic structures that make up the test are not 
above the proficiency level of the test takers, as this would otherwise undermine 
the results of the test. In a cloze-test, on the other hand, test takers do not need to 
keep the whole sentence in mind as it is being dictated, as they only need to focus 
on one or two missing words. This means a greater burden is placed on working 
memory when taking a dictation test than when taking a cloze-test. In a cloze-test, 
since learners see the rest of the sentence and know how many words there are in 
each sentence, their likelihood of guessing the missing word(s) is higher (Joyce, 
2013). Possibly to avoid such a problem, Matsuzawa (2006) used a different type of cloze-test,
in which the number of words missing was indicated in each sentence with 
blanks—three blanks meant three words—and did this for both the target and the 
non-target words. This alternative may help to a certain extent to address the 
problem of guessing words from context; however, before administering such a 
test, respondents should be informed about what exactly corresponds to a ‘word’ 
or whether a contracted form is counted as one or two words. Yet another 
alternative method to test connected speech perception is having forced-choice 
tests in which similar sounding options are presented to learners to choose from 
(see Gokgoz-Kurt, 2016). Using such a test reduces the burden on
working memory and eliminates the problem of having to write using correct
spelling or of having to understand all the lexical and syntactic structures, making 
it more suitable for testing aural perception rather than listening comprehension. 
However, there are specific challenges to preparing a forced-choice test for the 
assessment of connected speech. Although they vary depending on the type of 
CSPs, among these challenges are coming up with similar sounding yet contrasting 
pairs of grammatical phrases while adhering to the language-specific phonological 
constraints such as stress or limiting the number of cues which inadvertently
help learners guess the right option. Therefore, in order to have a more thorough
understanding of connected speech phenomena, better techniques and tools need 
to be developed. A good starting point could be designing mixed-method studies 
using two or more types of tests simultaneously, and then comparing the results of 
each tool to see if they would yield similar findings. This may provide us with 
alternatives to assess connected speech in a more structured way, paving the way 
for CSPs to constitute a larger part of L2 curriculum and instruction. Previous 
studies have also suggested better ways to assess connected speech production.
Designing studies which use speech recognition technology to provide feedback
on connected speech production is one of them (Alameen & Levis, 2015). However, 
since it is already hard to recognize connected speech usages in natural speech, 
developing some technology to identify them in an effective way seems to be a 
challenging yet interesting task to achieve. 


Finally, a complete picture of L2 connected speech learning in advanced L2 learners 
is only possible through a better understanding of the role of instruction and 
individual learner differences. Hence, further studies should look at the predictors of 
better connected speech perception and production, which could help researchers 
gain deeper insight into the factors underlying L2 phonological acquisition, and what 
it takes to reach truly ‘advanced’ standing in second language acquisition.


NOTES

1 A 1999 Tribune Media Group comic manipulates connected speech processes (CSPs) for 
humorous effect: 

South Carolina US Senator Strom Thurmond to incoming US Supreme Court Chief 
Justice William Rehnquist: 

DO YEWSOLMLY SWEAH DUPHOLE DEECONSTOOSHIN ADEEYOONATTID 
STAYSUVAMECKA SAHEPYAGOD? 

(‘Do you solemnly swear to uphold the Constitution of the United States of America,
so help you God?’)

Rehnquist, baffled, responds: Whatever you say. 

Here, segmental and syllabic reduction, palatalization, and other segmental differences 
occurring in connected speech (here, specifically a regional variety of Southern English) 
blur word boundaries and may cause confusion for the untrained listener. 


2 Another prominent framework is that of the Council of Europe, whose Common European 
Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR; Council of 
Europe, 2001) assigns learners to the categories of Proficient User (C2, C1), Independent 
User (B2, B1), and Basic User (A2, A1). Fluency is a criterion, but not pronunciation. 


3 However, it should be noted that while some learners of L2 English do not need to 
follow native-speaker norms in learning English, there are others who are motivated to 
learn and speak native-like L2 English. In addition, although learning and teaching 
connected speech forms may not be considered a part of ELF (English as a lingua franca, 
that is, English spoken as a common language among non-native speakers of English),
Jenkins (2000, 2004) suggests that one should improve their ability to understand these
forms if they are expecting to have considerable contact with native speakers.